Bring back `SMC` and allow `prior_predictive_sampling` to return transformed values #4769

ricardoV94 · 2021-06-14T15:17:10Z

SMC was broken after the refactoring because it starts with a prior_predictive_sampling call to set up the particles' positions, and it expects that call to also return transformed values. I extended prior_predictive to return transformed values if (and only if) transformed variables are explicitly passed in the optional var_names argument. For reference, in v3 transformed variables were returned by default. If anyone has a strong opinion about the old default, let me know!
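To illustrate the opt-in behaviour described above, here is a minimal, library-free sketch. It is not the PyMC implementation: the function `draw_prior`, the variable name `sigma`, and the `sigma_log__` name for its log-transformed counterpart are all assumptions made for illustration.

```python
import math
import random


def draw_prior(n_draws, var_names=None, seed=0):
    """Toy prior sampler for one positive variable, `sigma` (hypothetical)."""
    rng = random.Random(seed)
    # Draw sigma from an Exponential(1) prior on its natural (constrained) scale.
    sigma = [rng.expovariate(1.0) for _ in range(n_draws)]
    draws = {
        "sigma": sigma,
        # Log-transformed (unconstrained) counterpart of the same draws.
        "sigma_log__": [math.log(s) for s in sigma],
    }
    if var_names is None:
        # Default: only the untransformed variable is returned.
        return {"sigma": draws["sigma"]}
    # Transformed values are returned only when explicitly requested by name.
    return {name: draws[name] for name in var_names}


default_draws = draw_prior(5)
requested_draws = draw_prior(5, var_names=["sigma", "sigma_log__"])
```

In this sketch `default_draws` contains only `sigma`, while `requested_draws` also carries `sigma_log__`, which is the kind of output a sampler that works on the unconstrained scale would need.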

Tests were added for this as well as for the now stale issue #4490

junpenglao

LGTM

michaelosthege

These failing tests look systematic. Some kind of dtype problem..

ricardoV94 · 2021-06-15T13:23:15Z

> These failing tests look systematic. Some kind of dtype problem..

Definitely. One of the SMC tests is failing in float32 when there is a discrete variable:

https://github.com/pymc-devs/pymc3/blob/57e4f5a177c98a928c3e590800d1a17af4237b50/pymc3/tests/test_smc.py#L67-L72

It happens in join_nonshared_inputs, which concatenates all the raveled variables. This leads to an upcast to float64 when there are both discrete and continuous variables (since discretes are int64).

https://github.com/pymc-devs/pymc3/blob/f7d460212c0539e6c7a7ae394a3f2de6416068c7/pymc3/aesaraf.py#L604-L608

The problem then is that the input can no longer be of type float32 / floatX. I can make the test pass by just wrapping joined in a pm.aesaraf.floatX, confirming the problem lies there. I don't think this is a great fix though...
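The upcast can be reproduced in isolation with plain NumPy, whose promotion rules give the same result for this dtype pair. This is only a sketch of the dtype behaviour, not of `join_nonshared_inputs` itself: the array values are made up, and the final cast stands in for what wrapping the joined vector in a floatX cast would do when floatX is float32.

```python
import numpy as np

# Hypothetical raveled values: one continuous variable (float32) and one discrete (int64).
continuous = np.array([0.5, 1.5], dtype="float32")
discrete = np.array([3], dtype="int64")

# Concatenating mixed dtypes promotes to a type that can hold both: float64.
joined = np.concatenate([continuous, discrete])

# Casting the joined vector back restores float32, at the cost of an extra cast.
joined_floatx = joined.astype("float32")
```

Here `joined.dtype` is float64 even though the continuous input was float32, which is why a function compiled for a float64 joined input can no longer be fed float32 arrays without a cast.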

This could be a problem in other areas of the codebase that make use of this function. I've seen it in metropolis.py, mlda.py and pgbart.py.

Edit: Possibly related to #4553

Edit2: Link to the failing test: https://github.com/pymc-devs/pymc3/runs/2821726372?check_suite_focus=true#step:7:599
(I canceled the last workflow, I had just pushed a rebase and some minor comments, so the tests would still fail)

ricardoV94 · 2021-06-15T13:36:49Z

Also, more in general, within SMC we are treating discrete variables as continuous (e.g., in the proposal). Are we comfortable with this @aloctavodia? I know that the logp methods deal fine with float inputs, but it still feels strange to feed them non-rounded values.
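As a small illustration of why a discrete log-probability can accept non-rounded inputs, here is a Poisson logp written with the log-gamma function instead of a factorial. This is a generic textbook formula, and whether PyMC's distributions use exactly this expression is an assumption; `poisson_logp` is a hypothetical helper.

```python
import math


def poisson_logp(x, mu):
    """Poisson log-probability written with lgamma, so x may be any float >= 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1.0)


# At an integer the formula matches the usual probability mass function.
logp_int = poisson_logp(3, 2.0)
# At a non-rounded proposal it still returns a finite number.
logp_float = poisson_logp(2.5, 2.0)
```

The value at 2.5 is not a probability of any real outcome, which is the source of the discomfort raised in the comment, but it is well defined numerically.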

ricardoV94 · 2021-06-16T07:52:30Z

I temporarily disabled trust_input to pass the tests. I tried a bunch of things (mostly with clone_replace + casting) to allow the prior_logp_func function to accept float32 inputs in models with discrete variables, without success.

Edit: I restricted this change to when absolutely needed, and issued an informative UserWarning in those cases.
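The decision described in the edit above could look roughly like the following. This is a sketch only: `can_trust_input` is a hypothetical helper, the dtype strings are illustrative, and the warning text is invented rather than taken from the PR.

```python
import warnings


def can_trust_input(float_x, value_dtypes):
    """Hypothetical helper: decide whether input checks may be skipped."""
    has_discrete = any(dtype.startswith("int") for dtype in value_dtypes)
    if float_x == "float32" and has_discrete:
        # Only this combination needs the workaround, so only it triggers the warning.
        warnings.warn(
            "Discrete variables combined with floatX='float32': "
            "input validation stays enabled for the compiled function.",
            UserWarning,
        )
        return False
    return True
```

Keeping the check this narrow means models without discrete variables, or models running in float64, are unaffected by the workaround.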

aloctavodia · 2021-06-17T07:07:58Z

Sorry for being late to the party. Yes, I am comfortable with treating discrete variables as continuous when proposing new values.

OriolAbril · 2021-11-08T14:35:43Z

I saw this while checking the release notes:

pm.sample_prior_predictive no longer returns transformed variable values by default. Pass them by name in var_names if you want to obtain these draws (see 4769).

The converter to InferenceData ignores transformed values by default, so I find the phrasing a bit misleading and potentially troublesome. We should probably add an argument to the converter to include transformed variables in the InferenceData. Otherwise we'll need to keep the dict return, and the capabilities of the function will depend on the output chosen.

ricardoV94 · 2021-11-08T14:40:27Z

The plan is to revert this, see #5076

This is no longer needed, as we can use the model.initial_point to get transformed prior predictive samples when they are needed for our samplers

OriolAbril · 2021-11-08T14:55:59Z

I think I should probably go over issues and PRs in both pymc and arviz and make a serious cleanup of the integration with InferenceData, but I don't think I'll have time for a while. We still have arguments that were more workarounds than actual arguments/fixes and should be removed, as they are generally useless now (e.g. density_dist_obs in to_inference_data, keep_size in sample_posterior_predictive). The presence of the transforms is also annoying: arviz-devs/arviz#1509, arviz-devs/arviz#230. In general we can simplify the converter quite a bit now that it lives in the pymc codebase and should not need complicated logic to work with multiple pymc versions.

I think we could also make pointwise log likelihood storage and posterior predictive sampling work with dask. In my experience it is common that the model/posterior fits in memory but there are many observations, so the ll and pp do not. arviz does support working with dask-backed arrays; the main limitation right now is creating those dask-backed arrays.

Maybe other improvements are also relatively low-hanging fruit?

ricardoV94 force-pushed the smc_v4_compat branch 2 times, most recently from ee122d0 to 57e4f5a on June 14, 2021 16:15

junpenglao approved these changes Jun 14, 2021

michaelosthege approved these changes Jun 15, 2021

michaelosthege requested changes Jun 15, 2021

ricardoV94 added 4 commits June 15, 2021 15:46

Enable prior_predictive to return transformed values

687f044

Add test which closes pymc-devs#4490

2eb3193

Fix SMC regression and re-enable test_smc.py

4a731a5

Minor changes to the pytest.yml comments

22da46a

ricardoV94 force-pushed the smc_v4_compat branch from 57e4f5a to 22da46a on June 15, 2021 13:48

Add workaround for floatX == 'float32' and discrete variables

0ffdf22

ricardoV94 force-pushed the smc_v4_compat branch from 9ebfb4e to 0ffdf22 on June 16, 2021 08:15

ricardoV94 requested review from michaelosthege and junpenglao June 16, 2021 14:46

junpenglao approved these changes Jun 16, 2021

michaelosthege approved these changes Jun 16, 2021

ricardoV94 merged commit a90457a into pymc-devs:main Jun 16, 2021

ricardoV94 deleted the smc_v4_compat branch June 17, 2021 06:29

eigenfoo mentioned this pull request Jun 28, 2021

New commits to pymc3/sampling.py or pymc3/step_methods/hmc/ eigenfoo/littlemcmc#113

Closed

ricardoV94 mentioned this pull request Jul 8, 2021

Consistent API for choosing model RVs / value vars #4846

Open

ricardoV94 added the SMC Sequential Monte Carlo label Aug 18, 2021
